Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Authors
Abstract
We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture, which simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attention bias and complemented with contextualized visual information, while the core of our model is a pretrained encoder-decoder Transformer. Our novel approach achieves state-of-the-art results in extracting information from documents and answering questions which demand layout understanding (DocVQA, CORD, SROIE). At the same time, we simplify the process by employing an end-to-end model.
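To make the idea of layout as an attention bias concrete, here is a minimal NumPy sketch of single-head attention in which a learned bias, looked up from bucketed horizontal and vertical distances between token positions, is added to the attention scores before the softmax. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bucket`, `attention_with_layout_bias`), the uniform bucketing scheme, and the use of normalized token centers are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bucket(distances, num_buckets, max_distance):
    """Map signed distances to integer bucket ids in [0, num_buckets - 1].

    Assumption: uniform buckets over [-max_distance, max_distance].
    """
    clipped = np.clip(distances, -max_distance, max_distance)
    scaled = (clipped + max_distance) / (2 * max_distance)  # now in [0, 1]
    return np.minimum((scaled * num_buckets).astype(int), num_buckets - 1)


def attention_with_layout_bias(q, k, v, centers, h_bias, v_bias, max_distance=1.0):
    """Single-head attention with an additive layout bias.

    q, k, v : (n, d) arrays of queries, keys, and values.
    centers : (n, 2) array of (x, y) token centers (hypothetical input,
              assumed normalized to the page size).
    h_bias, v_bias : 1-D arrays of per-bucket biases; learned in a real
              model, passed in directly here.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)

    # Pairwise horizontal and vertical offsets between tokens.
    dx = centers[None, :, 0] - centers[:, None, 0]
    dy = centers[None, :, 1] - centers[:, None, 1]

    # Look up one scalar bias per token pair and per axis, then sum them.
    bias = (h_bias[bucket(dx, len(h_bias), max_distance)]
            + v_bias[bucket(dy, len(v_bias), max_distance)])

    weights = softmax(scores + bias, axis=-1)
    return weights @ v, weights
```

With all-zero bias tables the function reduces to ordinary scaled dot-product attention, which shows that the layout signal enters only through the additive term on the scores.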
Similar resources
Document image understanding: geometric and logical layout
Document Image Understanding encompasses the technology required to make paper documents equivalent to other computer exchange media like floppies, tapes, and CD-ROMs. The physical reader of the paper document is the scanner just like the physical reader of the floppy is the floppy drive and the physical reader of the tape cartridge is the tape cartridge drive, and the physical reader of the CD-ROM is ...
Geometric Layout Analysis Techniques for Document Image Understanding: a Review
Document Image Understanding (DIU) is an interesting research area with a large variety of challenging applications. Researchers have worked for decades on this topic, as witnessed by the scientific literature. The main purpose of the present report is to describe the current status of DIU with particular attention to two subprocesses: document skew angle estimation and page decomposition. Sev...
Integrated Text and Image Understanding for Document Understanding
Because of the complexity of documents and the variety of applications which must be supported, document understanding requires the integration of image understanding with text understanding. Our document understanding technology is implemented in a system called IDUS (Intelligent Document Understanding System), which creates the data for a text retrieval application and the automatic generat...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions. Both of them deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
Document Image Layout Comparison and Classification
This paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual similarity based document retrieval as well as fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes reg...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-86331-9_47